The Effects of Training on Raters' Accuracy
and Cognitive Categories

Recently,  research  on  performance  appraisal  has  shifted  from  a 
focus  on  rating  formats  and  rating  errors  to  rating  process 
variables  and  rating  accuracy.  Considerable  attention  has  been 
devoted to explicating the cognitive processes of raters involved
in performance evaluation (Landy & Farr, 1980; Feldman, 1981; Ilgen
& Feldman, 1983). The rating task is conceptualized as gathering,
storing and recalling information. A central component in this
process  is  the  storing  of  information  in  cognitive  categories  or 
"bins"  which  guide  attention  to  information  about  ratees  and  form 
the  basis  for  recall  of  that  information.  To  the  extent  that  the 
rater's category system facilitates attention to, storage of, and
recall of relevant ratee behaviors, performance evaluations should be
more accurate (Ilgen & Feldman, 1983). Ostroff and Ilgen (1985)
demonstrated that these cognitive categories of raters do influence
rating accuracy. This suggests that one way to improve performance
ratings  is  to  direct  attention  to  improving  the  cognitive 
categories  of  raters. 

Training  programs  aimed  at  increasing  rater  accuracy  have 
typically employed two types of training: "error" training and
"accuracy" training. Although attempts to train raters to avoid
common psychometric errors were successful in doing so (Bernardin &
Walter, 1977; Borman, 1975; Ivancevich, 1979; Latham, Wexley, &
Pursell, 1975), subsequent studies demonstrated that accuracy was
relatively unaffected by the reduction of psychometric errors in
ratings  (Bernardin  &  Pence,  1980;  Borman,  1975,  1979;  Pulakos, 
1984).  To  correct  this  problem,  an  alternative  approach  to  error 
training  was  proposed  by  Bernardin  and  Buckley  (1981)  and  Borman 
(1979) in which raters are trained to use a common
frame-of-reference to assess ratee behaviors. Studies utilizing this
"accuracy" approach typically concentrate on a) the
multidimensionality of performance, b) the specification of rating
scale dimensions and the behaviors comprising these dimensions, and
c) the recognition of possible discrepancies between "true" ratings
and the rater trainees' ratings (cf. Bernardin & Pence, 1980; McIntyre,
Smith & Haslett, 1984; Pulakos, 1984, in press). All of these studies
found that rating accuracy improves with such training.

Two sets of indirect evidence suggest that accuracy training
alters  the  cognitive  dimensions  people  use  to  assess  performance 
and  that  these  new  cognitive  dimensions  lead  to  more  accurate 
ratings.  First,  accuracy  training,  which  focuses  on  describing  the 
performance  dimensions  used  on  the  rating  scale,  increases  accuracy 
presumably by bringing the rater's categories for judging
performance  in  line  with  the  scale  dimensions.  Whether  the 
categories  used  by  the  rater  are  actually  more  consistent  with  the 
scale  after  training  has  not  been  tested.  Second,  those  who  have 
naturally occurring categories which are consistent with the scale
tend to be more accurate than those whose categories are more
discrepant from the scale (Ostroff & Ilgen, 1985). Taken together,
these findings provide indirect evidence for a sequence in which
training affects categories, which in turn affect accuracy. The data
support links two and three in the following sequence.

              1                       2
Accuracy  -------->  Rater Category  -------->  Rating
Training             Systems                    Accuracy
    |                        3                     ^
    |______________________________________________|

One  purpose  of  the  present  study  was  to  develop  a  rater 
training  program  directed  at  the  categories  raters  used  in 
evaluating performance. In this program, labeled the feedback
training program, raters' own cognitive categories were assessed,
feedback about the match between raters' own categories and the
categories relevant to the job was provided, and the effects of a good
match of categories to rating scales were discussed. Specifically,
raters were given feedback as to how well their categories matched
rating scale dimensions, the extent to which they distinguished between
job relevant and irrelevant behaviors and dimensions, and the
degree to which they differentiated among dimensions. It was
believed  that  this  more  direct  approach  to  linking  the  rater's 
categories  to  performance  appraisals  in  a  training  program  would  be 
more  useful  in  enhancing  rater  accuracy  than  the  more  general 
accuracy  training  approaches  previously  used.  The  following 
hypotheses  were  tested: 

Hypothesis 1A: Providing raters with standard rater
accuracy training or with feedback training will improve
rater accuracy.

Hypothesis 1B: Raters receiving feedback training will
show a greater improvement in rating accuracy than those
provided with standard rater accuracy training.

A  second  purpose  of  this  study  was  to  determine  if  training 
does indeed affect the rater's cognitive categories as assumed.
Although  previous  rater  training  programs  assumed  that  the 
cognitive  categories  of  raters  are  important  and  are  affected  by 
training,  no  research  has  addressed  this  issue  directly.  To  the 
degree  that  the  training  program  focuses  directly  on  the  rater's 
own  cognitive  categories,  the  training  should  have  a  greater 
influence on the rater's cognitive categories than a more general
training program. This implies that:

Hypothesis 2A: Both rater accuracy training and feedback
training will affect raters' cognitive categories.

Hypothesis  2B:  Providing  raters  with  feedback  training 
will  affect  raters'  cognitive  categories  to  a  greater 
extent  than  the  receipt  of  accuracy  training  without  feedback. 

Method 

Overview 

The  research  required  nurses'  participation  in  three  separate 
phases. The first phase involved an orientation session where
nurses completed the pre-training research measures. In the second
phase, nurses participated in one of two training programs. Both
Phase One and Phase Two were conducted at the hospitals where the
nurses were employed. Finally, in Phase Three, nurses responded by
mail to the post-training research measures.

Sample 

Participants  in  Phase  One  were  125  nurses  (97%  female)  from 
three  large  midwestern  hospitals.  Of  these,  71  also  participated 
in Phase Two and 53 participated in all three phases. Of the
total sample, 92% of the nurses had five or more years of work
experience and 87% had previous experience rating nurses.
Ninety-seven percent of the persons were in some type of supervisory
position. 

Stimulus  Materials  and  Measures 

Two  sets  of  measures  were  used  in  the  study.  The  Behavior 
Grid  assessed  raters'  cognitive  categories.  To  assess  rating 
accuracy, a videotape of a nurse performing job duties, a
performance  rating  scale  and  true  score  ratings  were  used.  (For  a 
more  complete  description  of  the  development  and  reliability  of 
these  measures,  see  Ostroff  &  Ilgen,  1985). 

Behavior  Grid.  The  Behavior  Grid  was  a  matrix  which  contained 
brief descriptions of behaviors as rows (e.g., "would expect this
nurse to give only a partial bath to an acutely ill cardiac patient
in an oxygen tent") and performance dimensions as columns (e.g.,
"Knowledge and Judgment"). A brief definition of each dimension
was also provided. In addition, one-half of the rows were
behaviors judged to be relevant to job performance and the other
half irrelevant. The same distinction was made for the dimensions
(columns). For example, an irrelevant behavior was "this nurse
wears a lot of make-up, perfume or cologne to work" and an
irrelevant dimension was "Appearance."

The  grid  itself  was  composed  of  empty  cells  with  each  row  and 
column  labeled  as  described  above  and  with  job  relevant  and 
irrelevant  rows  and  columns  randomly  ordered.  To  complete  the 
grid,  a  nurse  read  each  behavioral  label  for  the  row,  then  placed 
an  "X"  under  the  dimension  column  or  columns  that  he  or  she  felt 
the  behavior  represented.  Given  the  examples  mentioned  above,  a 
correct placement of the "giving a partial bath" behavior was under
the "Knowledge and Judgment" dimension.

Cognitive  Measures.  From  the  Behavior  Grid,  the  following  six 
measures  were  derived: 

1.  Rating  Scale  Match 

From  the  subset  of  dimensions  on  the  Grid  which  were 
identified  a  priori  as  relevant  to  the  nurse's  job  and  a 
subset  of  behaviors  which  described  those  dimensions,  an 
index  of  the  match  between  the  job  and  the  rater's 
perception  of  dimensions  and  behaviors  was  derived.  Each 
job  relevant  behavior  received  a  score  ranging  from  6  to 
1,  depending  on  the  degree  of  match  to  the  rating  scale. 
The  highest  score  was  given  when  an  "X"  appeared  in  the 
appropriate  column  for  the  behavior  and  in  no  other 
columns.  The  next  highest  score  was  for  an  "X"  placed  in 
the  appropriate  column  and  also  in  one  other  column.  The 
scores continued to decrease in a similar fashion,
depending upon the nature of the response. The scores
for each behavior were totalled so that the Rating Scale Match
scores  ranged  from  20  to  120.  Higher  scores  indicated  a 
greater  match. 

2.  Non-Job  Relevant  Behavior  Classification 

This  index  was  the  sum  of  the  number  of  times 
behaviors  identified  as  non-job  relevant  were 
misclassified as belonging to job related dimensions.

3.  Job  Relevant  Behavior  Classification 

In  a  manner  similar  to  2  above,  the  number  of  times 
job relevant behaviors were sorted into non-job relevant
dimensions  was  tallied. 

4.  Overall  Cognitive  Differentiation 

This  index  was  computed  by  totalling  the  number  of 
check  marks  (or  number  of  times  behaviors  were  placed  in 
dimensions)  each  rater  placed  in  the  grid.  Lower  scores 
indicated  greater  differentiation. 

5.  Job  Behavior  Cognitive  Differentiation 

This  index  was  computed  in  a  manner  similar  to  4 
above,  but  only  for  the  job  related  behaviors  in  the 
grid. 

6.  Non-Job  Cognitive  Differentiation 

In a manner similar to 4 above, the number of check
marks each rater placed in the grid for non-job related
behaviors was tallied.
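
The six indices can be illustrated with a short sketch. The grid is represented as a set of (behavior, dimension) check marks. The function name, the example behaviors, and the data layout are hypothetical. The text specifies only the two highest Rating Scale Match scores, so the rule used below for lower scores (one point lost per additional extra column, with a score of 1 when the appropriate column is left unchecked) is an assumption made purely for illustration and may differ from the scoring actually used.

```python
def cognitive_measures(marks, relevant_behaviors, irrelevant_behaviors,
                       relevant_dims, correct):
    """Compute the six Behavior Grid indices for one rater.

    marks: set of (behavior, dimension) pairs where the rater placed an "X".
    correct: dict mapping each job relevant behavior to its a priori dimension.
    """
    def dims_for(behavior):
        return {d for (b, d) in marks if b == behavior}

    # 1. Rating Scale Match: 6 = appropriate column only, 5 = appropriate
    #    column plus one other (both stated in the text).
    #    ASSUMPTION: each further extra column costs one more point (floor 1),
    #    and a behavior whose appropriate column is unchecked scores 1.
    match = 0
    for b in relevant_behaviors:
        chosen = dims_for(b)
        if correct[b] in chosen:
            match += max(1, 6 - (len(chosen) - 1))
        else:
            match += 1

    # 2. Non-job relevant behaviors placed in job related dimensions.
    nonjob_misclass = sum(1 for (b, d) in marks
                          if b in irrelevant_behaviors and d in relevant_dims)
    # 3. Job relevant behaviors placed in non-job relevant dimensions.
    job_misclass = sum(1 for (b, d) in marks
                       if b in relevant_behaviors and d not in relevant_dims)
    # 4-6. Differentiation indices are counts of check marks
    #      (lower counts indicate greater differentiation).
    overall_diff = len(marks)
    job_diff = sum(1 for (b, _) in marks if b in relevant_behaviors)
    nonjob_diff = sum(1 for (b, _) in marks if b in irrelevant_behaviors)

    return {
        "rating_scale_match": match,
        "nonjob_behavior_classification": nonjob_misclass,
        "job_behavior_classification": job_misclass,
        "overall_differentiation": overall_diff,
        "job_differentiation": job_diff,
        "nonjob_differentiation": nonjob_diff,
    }


# Hypothetical two-behavior grid for a single rater.
marks = {
    ("partial bath", "Knowledge and Judgment"),
    ("partial bath", "Appearance"),
    ("wears make-up", "Appearance"),
    ("wears make-up", "Conscientiousness"),
}
example = cognitive_measures(
    marks,
    relevant_behaviors={"partial bath"},
    irrelevant_behaviors={"wears make-up"},
    relevant_dims={"Knowledge and Judgment", "Conscientiousness"},
    correct={"partial bath": "Knowledge and Judgment"},
)
```

With 20 job relevant behaviors each scored from 1 to 6, this scheme reproduces the stated Rating Scale Match range of 20 to 120.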
Videotape and Rating Scale. A 25 minute videotape featuring a
nurse  in  a  hospital  setting  served  as  the  stimulus  material  for 
ratings.  It  featured  18  one  to  three  minute  scenes  depicting 
enactments  of  job  behaviors  from  one  or  more  of  the  five 
performance  dimensions  to  be  rated  on  the  performance  evaluation 
scale.  Within  each  dimension,  the  ratee's  behavior  was  designed  to 
be  consistent  in  effectiveness  level,  but  across  job  dimensions, 
the  effectiveness  level  varied.  True  scores  were  also  generated 
from  expert  raters.  These  ratings  served  as  the  standard  to  which 
subjects'  ratings  were  compared  and  from  which  performance  accuracy 
indices  were  computed. 

Ratings  of  the  nurse's  performance  were  made  using  Smith  and 
Kendall's (1963) behaviorally anchored rating scale (BARS)
developed  specifically  for  hospital  nurses.  The  five  dimensions  on 
the  BARS  were  Knowledge  and  Judgment,  Organizational  Ability,  Skill 
in  Human  Relations,  Conscientiousness  and  Observational  Ability. 

Accuracy  Measures.  Two  accuracy  measures  were  calculated  for 
each  rater  and  served  as  dependent  variables.  Cronbach's  (1955) 
component  of  overall  accuracy  was  computed  by  squaring  the 
difference  between  the  rated  and  true  scores  and  summing  over  all 
dimensions.  Lower  overall  accuracy  scores  indicated  greater 
accuracy.  Correlational  accuracy  was  also  computed  for  each  rater 
by  correlating  the  true  scores  and  the  observed  scores.  Higher 
correlational accuracy scores indicated greater accuracy in terms
of the pattern of performance levels across dimensions for the
ratee.
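
The two accuracy indices, as described above, can be sketched as follows. The rated and true scores are hypothetical values for the five BARS dimensions, and the use of Pearson's r for correlational accuracy is an assumption, since the text says only that true and observed scores were correlated.

```python
from math import sqrt


def overall_accuracy(rated, true):
    """Sum over dimensions of squared (rated - true) differences.

    Lower values indicate greater accuracy.
    """
    return sum((r - t) ** 2 for r, t in zip(rated, true))


def correlational_accuracy(rated, true):
    """Pearson correlation between rated and true scores across dimensions.

    Higher values indicate that the rater reproduced the pattern of
    performance levels across dimensions. Undefined (division by zero)
    if either set of scores has no variance.
    """
    n = len(rated)
    mean_r = sum(rated) / n
    mean_t = sum(true) / n
    cov = sum((r - mean_r) * (t - mean_t) for r, t in zip(rated, true))
    ss_r = sum((r - mean_r) ** 2 for r in rated)
    ss_t = sum((t - mean_t) ** 2 for t in true)
    return cov / sqrt(ss_r * ss_t)


# Hypothetical scores on the five dimensions (not data from the study).
true_scores = [6, 3, 5, 2, 4]
rated_scores = [5, 3, 6, 2, 5]

overall = overall_accuracy(rated_scores, true_scores)
correlational = correlational_accuracy(rated_scores, true_scores)
```

The two indices capture different things: a rater who is uniformly one point too lenient would show perfect correlational accuracy but a nonzero overall accuracy score.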

Procedure 

For the first phase, pre-training, nurses participated in a
one and one-half hour long session and were assessed in groups of
three to fifty persons per session. After a brief description of
the project, nurses completed several questionnaires. Next, they
completed  the  Behavior  Grid.  Once  completed,  the  questionnaires 
were  collected.  This  was  followed  by  an  explanation  of  the  rating 
scales  and  the  videotape.  Nurses  then  viewed  the  videotape  and 
rated  the  performance  of  the  videotaped  nurse  on  the  BARS  scale. 

Phase  Two,  training,  was  conducted  approximately  4-6  weeks 
later.  Raters  again  participated  in  a  one  and  one-half  hour  long 
session  in  which  they  received  one  of  the  treatment  (training) 
programs.  Hospitals  were  randomly  assigned  to  treatment  groups. 
Immediately  following  training,  nurses  again  observed  and  rated  the 
videotaped  nurse  using  the  BARS  scale. 

Approximately  4-6  weeks  following  the  training  phase,  nurses 
were  mailed  the  Behavior  Grid  and  were  again  asked  to  complete  the 
grid  following  the  procedure  described  in  Phase  One.  Seventy-seven 
percent  of  the  nurses,  who  had  volunteered  during  Phase  Two  to 
complete  the  final  questionnaire,  returned  the  completed  measure. 
The  third  phase  will  be  referred  to  as  the  post-training  session. 

In sum, accuracy was assessed during the first and second phases
and the cognitive measures were assessed during the first and
third.

Manipulations 

Two types of rater training programs were employed in the
study: rater accuracy training and feedback training. All nurses
received  a  brief  explanation  of  the  session,  namely  a  description 
of  its  goal  to  increase  the  accuracy  of  their  evaluations  of 
others'  performance.  A  brief  description  of  the  rating  process  as 
one  of  coding,  storing  and  recalling  information  about  ratees  was 
presented  emphasizing  the  storing  of  information  in  categories. 

The  pre-training  session,  which  included  the  same  rating  task, 
served  as  the  no-training  control. 

Rater  Accuracy  Training.  Accuracy  training  was  designed  to 
provide  raters  with  a  common  frame-of-reference  for  considering 
ratee  performance.  This  session  was  based,  in  part,  on  the 
procedure  outlined  by  Pulakos  (1984)  for  accuracy  training.  Nurses 
were  first  lectured  on  the  multidimensionality  of  the  job  and  on 
the importance of attending to performance related to these job
dimensions. Participants were then given the BARS rating form.
Global definitions of each dimension were given by the trainer,
followed by an in-depth description of the behaviors comprising the
dimensions.  The  types  of  behaviors  indicative  of  various 
effectiveness  levels  within  each  dimension  were  discussed  by 
pointing  out  differences  in  the  effective  versus  ineffective 
behaviors which served as scale anchors. Nurses then practiced
using the scales by rating a short sample of three, one to three
minute,  videotaped  scenes. 

Next, a random sample of nurses' ratings was placed on an
overhead, by dimension, and the group discussed what particular
ratee behaviors led them to their ratings. The trainer also
provided feedback on the accuracy of their practice ratings. Next,
common rating errors (halo, central tendency, leniency, contrast,
first impression, similar-to-me and stereotype) were explained and
the  trainer  pointed  out  examples  of  such  errors  in  the  practice 
ratings.  Finally,  participants  viewed  a  second  sample  of  videotape 
and  again  practiced  making  ratings.  Group  discussions  and  feedback 
on  their  accuracy  followed  as  described  for  the  first  practice 
sample.  An  overview  of  the  session  ended  the  training  program. 

Feedback  Training.  The  feedback  training  session  was  designed 
to incorporate specific feedback to raters on the cognitive
categories they had used in evaluating others in the earlier
session. The importance of a) focusing on specific behaviors
rather than general traits, b) distinguishing between job relevant
and non-job relevant behaviors, c) defining the appropriate
behaviors for particular job dimensions, and d) differentiating
between job dimensions was highlighted in the lecture. Each rater
received  a  feedback  form  with  scores  derived  from  the  Behavior  Grid 
which  they  completed  during  Phase  One.  The  trainer  first  explained 
the  distinction  between  job  behaviors  and  more  general  personal 
characteristics of ratees, following which an explanation of job
versus non-job related behaviors and dimensions ensued. Raters
were  directed  to  their  feedback  form  to  determine  the  percent  of 
non-job  related  behaviors  they  had  perceived  as  belonging  to  each 
of  the  five  job  dimensions  (from  the  BARS)  and  the  percent  of  job 
relevant  behaviors  they  placed  in  dimensions  irrelevant  to  job 
performance.  It  was  explained  to  trainees  that  misclassifying  the 
behaviors  in  such  a  way  could  lead  to  erroneous  judgments  of 
performance.  Discussion  then  focused  on  job  relevant  behaviors  and 
the  importance  of  observing  and  defining  the  appropriate  behaviors 
for  each  of  the  five  performance  dimensions.  Thus,  feedback  was 
provided,  for  each  job  dimension,  as  to  the  percent  of  job 
behaviors  correctly  placed  in  the  appropriate  dimension.  The 
trainer  also  explained  the  importance  of  differentiating  between 
job dimensions as opposed to viewing every job behavior as
belonging to every dimension. Raters then received feedback as to
whether they were low, average or high in differentiating among
dimensions and were told that if they received low or average
scores, they needed to focus on distinguishing between which
behaviors  belong  in  which  dimensions.  Raters  were  also  instructed 
to  pay  particular  attention,  in  the  remainder  of  the  session,  to 
those  dimensions  for  which  they  received  low  scores.  Following  the 
feedback  discussion,  the  trainer  provided  accuracy  and  error 
training  by  following  the  procedure  described  above  for  accuracy 
training.  However,  in  order  to  keep  the  length  of  the  two  training 
programs the same, only one practice and subsequent discussion,
rather than the two given in the accuracy session, was given to
raters in feedback training.

Experimental  Design 

The hypotheses were tested with 2 x 2 factorial designs with
repeated measures on the second factor. The first factor was
Training (Accuracy versus Feedback). The repeated measure factor
was two levels of training Experience (low experience for
pre-training and high experience for post-training).
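
In a design with two groups and two repeated measurements, the Training x Experience interaction is equivalent to a between-groups comparison of change scores: the squared pooled-variance t statistic computed on the (post minus pre) differences equals the interaction F with 1 and N - 2 degrees of freedom. The sketch below illustrates this equivalence only. The function name and the scores are hypothetical, and it is not a reconstruction of the analysis software used in the study.

```python
from statistics import mean, variance


def interaction_f(pre_a, post_a, pre_b, post_b):
    """Interaction F for a 2 (group) x 2 (time) repeated measures design.

    Computed as the squared pooled-variance t statistic comparing the two
    groups on their change scores (post minus pre).
    Returns (F, error degrees of freedom).
    """
    diff_a = [post - pre for pre, post in zip(pre_a, post_a)]
    diff_b = [post - pre for pre, post in zip(pre_b, post_b)]
    n_a, n_b = len(diff_a), len(diff_b)
    df_error = n_a + n_b - 2

    # Pooled variance of the change scores across the two groups.
    pooled_var = ((n_a - 1) * variance(diff_a)
                  + (n_b - 1) * variance(diff_b)) / df_error
    std_error = (pooled_var * (1 / n_a + 1 / n_b)) ** 0.5
    t = (mean(diff_a) - mean(diff_b)) / std_error
    return t ** 2, df_error


# Hypothetical overall accuracy scores (lower = more accurate).
accuracy_pre, accuracy_post = [3.0, 4.0, 5.0], [2.0, 3.0, 3.0]
feedback_pre, feedback_post = [4.0, 4.0, 6.0], [2.0, 1.0, 3.0]

f_value, df_error = interaction_f(accuracy_pre, accuracy_post,
                                  feedback_pre, feedback_post)
```

With 71 raters split across two training groups, the error degrees of freedom under this formulation are 71 - 2 = 69.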

Results 

Training  Effects  on  Accuracy 

The means and standard deviations of the two accuracy
measures, overall and correlational accuracy, are reported in Table
1. Within experience level, the two accuracy measures were highly
intercorrelated (r = .68 for low experience and r = .76 for high
experience).

A 2 x 2 (training x experience) ANOVA with repeated measures
on the second factor was performed to assess training and
experience effects on overall accuracy. Results indicated no
significant main effect for training, F(1,69) = .00, p = 1.0, but a
significant main effect resulted for experience, F(1,69) = 17.14,
p = .0001. Mean comparisons using Newman-Keuls tests revealed that
raters were more accurate, measured by overall accuracy, after high
experience than low experience. No training x experience
interaction was found, F(1,69) = .08, p = .78.


Table 1

Means and Standard Deviations of Accuracy Measures by Experience
and Training

                     Overall Accuracy           Correlational Accuracy
Experience        Accuracy  Feedback  Total   Accuracy  Feedback  Total

Low       M          .77       .75     .76       .69       .73     .71
          SD         .74       .50     .63       .22       .11     .18
          N           38        33      71        37        33      70

High      M          .44       .46     .45       .73       .75     .74
          SD         .32       .34     .32       .10       .06     .09
          N           38        33      71        37        33      70

Totals    M          .61       .60     .60       .71       .74     .72
          SD         .59       .45     .28       .17       .09     .14
          N           76        66     142        74        66     140

With respect to Hypothesis One, recall that the pre-training
accuracy scores (assessed in the first session, i.e., low
experience) were derived after raters were given some orientation
to appraisal processes. During the low experience, pre-training
session, raters may have gained some information, on their own,
simply as a result of experience with the task, that served to
enhance their rating accuracy. If so, the training sessions may
have enhanced rating accuracy for both types of training, but may
not have supplemented this information enough to reflect
differences in the training programs. That is, the experience
provided by the first exposure to the task may have provided a base
upon which improvements occurred for the second session, but there
may  not  have  been  sufficient  room  for  improvement  to  detect 
differences  between  treatments. 

To determine if participation in both low and high experience
sessions affected accuracy scores in the training programs
differently, we identified those who had not been present at the
first phase of research but had attended the training session
(n = 24). This group was labeled the low participation group in a 2 x
2 factorial design with two levels of participation and two levels
of training, using only data from Phase Two. The means and
standard deviations for the accuracy measures by participation are
presented in Table 2. For overall accuracy scores, results
indicated no significant main effect for training, F(1,91) = .46,
p = .5. A significant main effect was found for participation,
F(1,91) = 4.10, p < .05, and there was a trend toward a training x
participation interaction, F(1,91) = 3.16, p < .08. Mean comparisons
using Newman-Keuls tests revealed that raters who received Accuracy
training and who participated only in the training session were less
accurate than those raters in any other group.

All  of  the  above  analyses  were  performed  for  the  correlational 
accuracy index; however, no significant results were revealed. This
was  probably  due  to  the  fact  that  there  was  little  variance  in  the 
correlational  accuracy  scores  across  the  sample. 

Training  Effects  on  Cognitive  Categories 

Intercorrelations of the cognitive variables measured in the first
and third phases are reported in Table 3. Within session,
there were high intercorrelations between the cognitive measures.
The low correlations across sessions (r's ranged from .00 to .14)
implied that the raters' scores on the cognitive measures changed
over time.

To  test  for  training  effects  on  raters'  cognitive  categories, 
a  2  x  2  (Training  x  Experience)  multivariate  analysis  of  variance 
with  training  as  the  fixed  factor  and  experience  as  the  repeated 
measure  was  performed  including  all  six  cognitive  measures  as 
dependent variables. Results of this MANOVA revealed no
significant main effect for training, F(5,47) = .36, p = .87, but a
significant main effect for experience, F(5,47) = 13.32, p < .001, and
a significant training x experience interaction, F(5,47) = 3.08,
p = .02. Due to the significant main and interaction effects, 2 x 2


Table 2

Means and Standard Deviations of Accuracy Measures by Participation
and Training

                     Overall Accuracy           Correlational Accuracy
Participation     Accuracy  Feedback  Total   Accuracy  Feedback  Total

Low       M          .76       .39     .67       .68       .77     .71
          SD         .16       .13     .13       .05       .01     .04
          N           18         6      24        18         6      24

High      M          .44       .46     .45       .73       .75     .74
          SD         .32       .34     .32       .10       .06     .09
          N           38        33      71        37        33      70

Totals    M          .54       .45     .50       .71       .75     .73
          SD         .48       .33     .43       .14       .06     .11
          N           56        39      95        55        39      94

Note. Low participation indicates attendance at only Phase Two. High
participation is attendance at both Phase One and Two.

Table 3

Intercorrelations of Cognitive Measures Obtained from Nurses Who
Attended All Three Phases

Pre-Training

Cognitive Measure                 1       2       3       4

1. Rating Scale Match            --
Behavior Classification
   2. Non-Job                   -.53
   3. Job                       -.50     .63
Cognitive Differentiation
   4. Overall                   -.65     .84     .72
   5. Job                       -.67     .80     .74     .96
   6. Non-Job                   -.56     .80     .61     .94

Post-Training

Cognitive Measure                 1       2       3       4

1. Rating Scale Match            --
Behavior Classification
   2. Non-Job                   -.36
   3. Job                       -.24     .45
Cognitive Differentiation
   4. Overall                   -.34     .73     .61
   5. Job                       -.41     .62     .60     .97
   6. Non-Job                   -.18     .80     .54     .90

75 
(training x experience) ANOVAs with repeated measures on the second
factor  were  performed  for  each  of  the  six  dependent  variables  to 
determine  which  of  the  cognitive  measures  contributed  to  the 
significant  effects. 

The  means  and  standard  deviations  for  the  cognitive  measures 
as  affected  by  training  and  experience  are  presented  in  Table  4  and 
the  results  of  the  ANOVAs  for  the  six  cognitive  measures  appear  in 
Table  5.  As  can  be  seen  from  Table  5,  the  primary  effect  on 
cognitive  variables  resulted  from  increases  in  exposure  to  the 
rating  task  and  training  from  the  first  to  the  third  phase.  All 
six  measures  changed  significantly  and  this  change  accounted  for  an 
average  of  11%  of  the  variance  based  upon  the  mean  of  the  Omega 
squares  for  experience  across  the  six  variables.  Inspection  of  the 
patterns  of  the  means  for  the  marginally  significant  interactions 
indicated  that  these  did  not  alter  the  nature  of  the  changes 
resulting  from  experience.  In  all  cases,  the  shifts  were  toward 
improvement in the cognitive responses.

The  interaction  data  were  less  clear  cut.  It  was  hypothesized 
that  experience  would  lead  to  changes  in  the  cognitive  variables, 
but  that  those  who  received  feedback  training  would  change 
(improve)  to  a  greater  extent,  after  training,  than  those  who 
received  accuracy  training.  That  is,  an  interaction  effect  was 
predicted  such  that  no  differences  between  groups  would  exist 
during the first phase, improvement would occur for both groups
between Phase One and Phase Three assessments, and improvement in
Table 4

Means and Standard Deviations for Cognitive Variables by Training and Experience

                                        Training x Experience                  Experience
                                Accuracy (n = 30)        Feedback (n = 23)     Totals (n = 53)
Cognitive Variable             Low     High    Total     Low     High    Total     Low     High

Rating Scale Match        M   87.27  105.50   96.37    89.35  105.70   97.54    88.17  105.60
                          SD  16.57    8.01   15.83    14.24    9.78   14.65    15.49    8.73
Behavior Classification
  Non-Job                 M    4.50    2.97    3.73     5.61    1.39    3.50     4.98    2.28
                          SD   4.75    2.97    4.00     4.10    2.13    3.87     4.47    2.73
  Job                     M    2.33    0.67    1.50     2.83    0.57    1.70     2.55    0.62
                          SD   2.62    0.92    2.12     3.93    0.66    3.01     3.22    0.81
Cognitive Differentiation
  Overall                 M   73.80   53.97   63.88    82.17   48.00   65.09    77.43   51.38
                          SD  38.62   12.00   30.07    27.88    8.97   26.79    34.33   11.10
  Job                     M   42.70   29.70   36.20    49.39   25.48   37.43    45.60   27.87
                          SD  20.88    8.07   17.01    19.59    5.81   18.71    20.41    7.42
  Non-Job                 M   31.10   24.27   27.68    32.78   22.52   27.65    31.83   23.51
                          SD  19.23    4.77   14.31     9.30    3.73    8.71    15.61    4.39

Table 5

Results of Analyses of Variance for Cognitive Measures

categories would be greater for those who received feedback
training  as  compared  to  accuracy  training.  As  reported  earlier,  a 
significant  interaction  was  found  when  all  six  cognitive  variables 
were  used  in  the  MANOVA.  However,  the  univariate  analyses  revealed 
only  marginally  significant  interaction  effects  for  two  of  the  six 
variables.

Given the significant interaction for the MANOVA and our
interest in comparisons between training conditions, comparisons
between training programs were conducted on the Phase Three data
only. Three of the six comparisons were significant (Non-Job
Behavior Classification, F(1,52) = 4.65, p = .04; Overall Cognitive
Differentiation, F(1,52) = 3.97, p = .05; Job Cognitive
Differentiation, F(1,52) = 4.50, p = .04). In these cases,
training with individualized feedback and a discussion of cognitive
categories  in  rating  created  more  beneficial  responses  on  cognitive 
variables  than  did  accuracy  training  alone  (Omega  squares  for  the 
three  variables  were  .06,  .05,  .06,  respectively). 
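
The omega squared values above can be approximately recovered from the reported F ratios. The sketch below is illustrative only: it assumes a one-way between-subjects comparison, so that the total sample size equals the effect and error degrees of freedom plus one (54 here), and the function name `omega_squared` is ours, not part of the original analysis.

```python
def omega_squared(f_value, df_effect, df_error):
    """Estimate omega squared from an F ratio.

    Assumes a one-way between-subjects design, so the total N is
    df_effect + df_error + 1 (an assumption, not stated in the text).
    """
    n_total = df_effect + df_error + 1
    numerator = df_effect * (f_value - 1.0)
    return numerator / (numerator + n_total)


# F ratios as reported for the three significant Phase Three comparisons.
reported_f = {
    "Non-Job Behavior Classification": 4.65,
    "Overall Cognitive Differentiation": 3.97,
    "Job Cognitive Differentiation": 4.50,
}

for name, f_value in reported_f.items():
    estimate = omega_squared(f_value, df_effect=1, df_error=52)
    print(f"{name}: omega squared = {estimate:.2f}")
```

Under these assumptions the estimates round to .06, .05, and .06, in line with the reported values.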

One additional descriptive feature of the cognitive data
deserves mention. For all dependent variables, the variances of
the scores were smaller in Phase Three than in Phase One, and for
five of the six variables the variances were smaller with feedback
training than with accuracy training (see Table 4). The pattern of
variances provides some additional evidence for the positive effect
of training experience on raters' cognitive categories and for the
advantage of providing specific feedback.²
Discussion

Although previous research on rater accuracy training has
demonstrated that such training can lead to more accurate
appraisals (Bernardin & Pence, 1980; McIntyre et al., 1984;
Pulakos, 1984, in press), the research has provided little
information about how accuracy training actually improves rating
accuracy. The lack of such information limits the usefulness of
the training, because few data are available on the factors that
influence accuracy. Greater knowledge about which variables affect
accuracy, and how, would provide guidance for the development of
future accuracy training.

One of the most prevalent explanations for how accuracy
training affects ratings is through its effect on the way in which
raters organize and store information about ratees in memory. Yet
this explanation has been based primarily on indirect inferences
from the social cognition literature, rather than on research
directly addressing performance appraisals (Ilgen & Favero, 1985;
Ostroff & Ilgen, 1985). The present study provides more direct
support for the influence of training directed toward improving
rating accuracy on the cognitive categories used by raters.

This study replicated previous findings that rater accuracy
training actually improves the accuracy with which raters evaluate
others' performance. Two types of training programs were used:
standard accuracy training, and feedback training, which
incorporated giving personalized feedback on categories and a
discussion of the effect of categories in the rating task. Both
training programs significantly increased rating accuracy from
pre-training to post-training.

Because raters evaluated the ratee's performance twice, once
during pre-training and once post-training, it was possible that
experience or practice in rating, rather than training, was the
factor leading to increased accuracy scores. While this
explanation is plausible, there are some data to counter it.
Raters who did not have pre-training experience, but received
feedback training, had mean accuracy scores similar to those of
raters who attended both sessions. If experience alone were the
explanation, then raters without prior experience should have had
lower accuracy scores than those with experience. This was not the
case for those given feedback training only but, interestingly, it
did occur for those raters who received only accuracy training.

Raters who received only accuracy training had accuracy scores
which were lower than the post-training accuracy scores of raters
who participated in both sessions, and their mean accuracy scores
were similar to the pre-training accuracy scores.

Further, for those who participated in only the training session,
those who received feedback training were more accurate than those
who received accuracy training. Taken together, these results
imply that feedback training has a stronger effect on accuracy
scores, but this may be moderated by the amount of time, or
experience, spent in training. Raters who attended both sessions
spent a total of three hours in training, but only one-half hour of
time in these sessions reflected differences between accuracy and
feedback training. For those who attended only the training
session, one-half hour out of one and one-half hours differentiated
feedback from accuracy training. Here, the proportion of time spent
emphasizing feedback training, rather than accuracy training, was
greater, and thus the differences in the training programs may not
have been swamped by the other information presented. Our
suggestion is that when only a one-time training program is
implemented, feedback training appears to be a superior strategy
when discussion of the effects of categories on the rating task is
incorporated in training.
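
The time argument above can be made concrete with a short calculation. This is only an illustration of the proportions implied by the hours stated in the text; the function name `distinct_share` is ours.

```python
from fractions import Fraction


def distinct_share(distinct_hours, total_hours):
    """Share of total training time that differed between the two programs."""
    return Fraction(distinct_hours) / Fraction(total_hours)


# Raters who attended both sessions: 1/2 hour out of 3 hours.
both_sessions = distinct_share(Fraction(1, 2), 3)

# Raters who attended only the training session: 1/2 hour out of 1 1/2 hours.
training_only = distinct_share(Fraction(1, 2), Fraction(3, 2))

print(both_sessions, training_only, training_only / both_sessions)
# prints "1/6 1/3 2"
```

The distinguishing content thus made up twice as large a share of the session for raters who attended only the training session.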

One question which arises concerns what component of the
feedback training led to increases in accuracy. Originally, we
believed that providing raters with personalized feedback about
their category systems would serve to increase rating accuracy.
However, those raters who did not participate in the pre-training
session, but attended the feedback training session, did not
receive personalized feedback about their categories (the feedback
was derived from measures completed during the first session).
Yet, even in the absence of personalized feedback, these raters
were as accurate as those who attended both sessions, and were more
accurate than those who received only accuracy training.
Therefore, it is likely that the discussion which focused
specifically on the effects of categories in the rating process led
to the increased accuracy, rather than the feedback per se. This
discussion may have provided valuable information to raters about
the rating task.

One of the unique aspects of this study was that it
investigated, and found, that training does influence categories;
studies using the accuracy approach have only implicitly assumed
this link. Based on the findings here, it is apparent that
training did, in fact, influence raters' cognitive categories when
assessments of the categories were made approximately one month
prior to and one month after training. Both training programs had
a positive effect on raters' categories. Further, for some of the
cognitive category indices, feedback training, which focused on
categories, had a greater effect than accuracy training. Training
programs which directly focus on cognitive processes by providing
individualized feedback to raters about their own category systems
and by discussing the role of categories in the rating process may
make it easier for raters to identify and alter their idiosyncratic
category systems, and thus have a greater impact on the categories.

Although the effects of any of the variables in the study on
appraisal accuracy were not very strong, they were relatively
consistent with much of the accuracy research that uses the
experimental design used here, that is, one in which raters view
videotapes with known standards of behaviors. Videotapes can be
used for the standards of performance only if there is high
agreement among expert judges about the behavior displayed on the
tape. Without high agreement, the standard for judging accuracy is
not well defined. The result of this high agreement is that the
final stimulus tape may contain obvious behaviors which allow
naive subjects to be quite accurate in their judgments. This was
the case in the present study and, we suspect, in many other
studies employing this paradigm. We would expect that our findings
with respect to accuracy, and the findings of others, may be
conservative. That is, in job settings with more abstract
behaviors, the effects should be stronger. However, there is a
need to seek other paradigms for accuracy research in order to
replicate these and other findings on accuracy.

The model used for rater training assumes that training
affects cognitive categories which, in turn, affect accuracy.
Results discussed thus far have indicated that training affects
categories and training affects accuracy. Categories of raters
have also been shown to be related to accuracy, but these measures
were derived prior to training (Ostroff & Ilgen, 1985). Thus,
additional correlational analyses were performed to determine if
post-training categories of raters were related to rating accuracy
after training. Non-Job Cognitive Differentiation and Rating Scale
Match were significantly related to overall accuracy (r = -.27,
p = .03 and r = -.40, p = .002, respectively).
It was also possible to use these data to examine the mediating
effect of categories on the relationship between training and
accuracy. Training was coded as a dichotomous variable
(pre-training versus post-training) and was correlated with raters'
overall accuracy scores (r = -.27, p = .004). When the effects of
the six cognitive category measures were controlled for in a
partial correlation, the correlation between training and accuracy
was reduced (r = -.18, p = .04). Although this test was not
optimal, as the repeated measures scores were used independently in
the correlation and hence the sample size was doubled, it provided
some means to test for the mediating effect. It appears that
cognitive categories have some mediating effect, but there is still
a significant direct relationship between training and accuracy.
It is reasonable to assume that while cognitive categories do have
some effect on rating accuracy, other factors enter into this
process.
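
The logic of this mediation check can be sketched in code. The example below is a minimal illustration, not the analysis actually run: the data are synthetic, the variable names are hypothetical, and the partial correlation is computed with the standard recursion that removes one control variable at a time.

```python
import math
import random


def pearson(x, y):
    """Zero-order Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, controls):
    """Correlation between x and y with every variable in controls partialled out."""
    if not controls:
        return pearson(x, y)
    z, rest = controls[0], controls[1:]
    r_xy = partial_corr(x, y, rest)
    r_xz = partial_corr(x, z, rest)
    r_yz = partial_corr(y, z, rest)
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Synthetic data only (NOT the study's data): training is coded 0/1, six
# hypothetical category measures depend partly on training, and accuracy
# depends on both the category measures and training.
random.seed(0)
n = 100
training = [0] * (n // 2) + [1] * (n // 2)
categories = [[t + random.gauss(0, 1) for t in training] for _ in range(6)]
accuracy = [
    sum(measure[i] for measure in categories) / 6
    + 0.5 * training[i]
    + random.gauss(0, 1)
    for i in range(n)
]

print("zero-order r:", round(pearson(training, accuracy), 2))
print("partial r:   ", round(partial_corr(training, accuracy, categories), 2))
```

A partial correlation that is smaller in magnitude than the zero-order correlation but still different from zero is the pattern described in the text: part of the training effect runs through the category measures, and part does not.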

When considering these results, it is important to remember
that the pre-training scores were not a true "control" against
which to compare post-training scores; some knowledge about rating
may have been gained during the first session prior to assessment
of the pre-training accuracy scores. Thus, these relationships may
underestimate the effects that would be observed if a control
group with no prior experience were used.

Conclusion. Taken together, the findings presented here
indicate that rater training should be expanded to include
components that concentrate directly on raters' cognitive
categories for appraising performance. The present study
incorporated only one component of raters' cognitive processes,
namely the storing of information into categories. Future research
on rater training programs could concentrate on observational
skills and recall processes of raters to fully incorporate the
cognitive processes of raters into training programs. In addition,
most studies investigating the effects of rater training have been
lab studies using undergraduate students as raters and not the
actual persons who make evaluations of others. Because this study
was conducted in a field setting using "real world" people as
subjects, it is evident that rater training can be successfully
applied in organizational settings.
Footnotes

¹Additional data, not relevant to the focus of this study, were
collected during this phase. The results of these are reported in
Ostroff & Ilgen (1985).

²Although the differences in variance suggest a lack of
homogeneity of variance across treatment conditions, the
proportionality of the cell means and the fact that ANOVAs are
quite robust to violations of homogeneity of variance when
proportionality exists (Winer, 1971) suggest that the ANOVAs are
appropriate analyses.